An efficient k-modes algorithm for clustering categorical datasets

Abstract

Mining clusters from data is an important endeavor in many applications. The $k$-means method is a popular, efficient, and distribution-free approach for clustering numerical-valued data, but it does not apply to categorical-valued observations. The $k$-modes method addresses this lacuna by replacing the Euclidean distance with the Hamming distance and the means with the modes in the $k$-means objective function. We provide a novel, computationally efficient implementation of $k$-modes, called OTQT. We prove that OTQT finds updates to improve the objective function that are undetectable to existing $k$-modes algorithms. Although slightly slower per iteration due to algorithmic complexity, OTQT is always more accurate per iteration and almost always faster (and only barely slower on some datasets) to the final optimum. Thus, we recommend OTQT as the preferred, default algorithm for $k$-modes optimization.
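To make the objective in the abstract concrete, the following is a minimal sketch of the basic alternating (Lloyd-style) $k$-modes scheme: assign each record to the mode at the smallest Hamming distance, then recompute each cluster's per-attribute mode. It is not the OTQT algorithm, whose update rule the abstract does not describe. The function names, the random initialization, and the tie-breaking rules are illustrative assumptions.

```python
from collections import Counter
import random


def hamming(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category per attribute (ties go to the first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, max_iter=100, seed=0):
    """Lloyd-style k-modes sketch: alternate assignment and mode updates.

    `data` is a list of equal-length tuples of categories.
    Returns (labels, modes, cost), where cost is the total Hamming
    distance from each record to the mode of its assigned cluster.
    """
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # assumed initialization: k random records
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under Hamming distance.
        new_labels = [
            min(range(k), key=lambda j: hamming(x, modes[j])) for x in data
        ]
        if new_labels == labels:
            break  # no record changed cluster
        labels = new_labels
        # Update step: per-attribute mode of each non-empty cluster.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_modes(members)
    cost = sum(hamming(x, modes[lab]) for x, lab in zip(data, labels))
    return labels, modes, cost
```

Calling `k_modes(records, k=2)` on a list of tuples returns the cluster labels, the fitted modes, and the value of the objective. Like $k$-means, this basic scheme only reaches a local optimum that depends on the initialization, so it is usually run from several seeds.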


Similar resources

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

The detection and prevention of crime, in the past few decades, required several years of research and analysis. Today, however, smart systems based on data mining techniques make it possible to detect and prevent crime in considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...


K-Histograms: An Efficient Clustering Algorithm for Categorical Dataset

Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-histogram, a new efficient algorithm for clustering categorical data. The k-histogram algorithm extends the k-means algorithm to categorical domain by replacing the means of clusters with histograms, and dynamically updates histograms in the clustering process. E...


A fuzzy k-modes algorithm for clustering categorical data

This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of means for clusters, a new approach is developed, which allows the use of the k-means paradigm to efficiently cluster large categorical data sets. A fuzzy k-modes algorithm is presented and the effec...


A genetic fuzzy k-Modes algorithm for clustering categorical data

The fuzzy k-Modes algorithm introduced by Huang and Ng [Huang, Z., & Ng, M. (1999). A fuzzy k-modes algorithm for clustering categorical data. IEEE Transactions on Fuzzy Systems, 7(4), 446–452] is very effective for identifying cluster structures from categorical data sets. However, the algorithm may stop at locally optimal solutions. In order to search for appropriate fuzzy membership matrices...


CBK-Modes: A Correlation-based Algorithm for Categorical Data Clustering

Categorical data sets are often high-dimensional. For handling the high-dimensionality in the clustering process, some works take advantage of the fact that clusters usually occur in a subspace. In soft subspace clustering approaches, different weights are assigned to each attribute in each cluster, for measuring their respective contributions to the formation of each cluster. In this paper, we...



Journal

Journal title: Statistical Analysis and Data Mining

Year: 2021

ISSN: 1932-1864, 1932-1872

DOI: https://doi.org/10.1002/sam.11546